Efficient Probabilistic Performance Bounds for Inverse Reinforcement Learning
Abstract
In the field of reinforcement learning, there has been recent progress towards safety and high-confidence bounds on policy performance. However, to our knowledge, no methods exist for determining high-confidence safety bounds for a given evaluation policy in the inverse reinforcement learning setting—where the true reward function is unknown and only samples of expert behavior are given. We propose a method based on Bayesian Inverse Reinforcement Learning that uses demonstrations to determine practical high-confidence bounds on the difference in expected return between any evaluation policy and the expert’s underlying policy. A sampling-based approach is used to obtain probabilistic confidence bounds using the financial Value at Risk metric. We empirically evaluate our proposed bound on a standard navigation task for a wide variety of ground truth reward functions. Empirical results demonstrate that our proposed bound provides significant improvements over a standard feature count-based approach, providing accurate, tight bounds even for small numbers of noisy demonstrations.
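The sampling-based Value at Risk bound described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: the synthetic Gaussian samples stand in for samples of the policy-loss (expected value difference between the expert's policy and the evaluation policy under rewards drawn from a Bayesian IRL posterior), and the function name `value_at_risk` is hypothetical.

```python
import random

random.seed(0)

def value_at_risk(losses, alpha=0.95):
    """alpha-VaR: an upper bound on the loss that holds for at
    least an alpha fraction of the sampled losses."""
    s = sorted(losses)
    idx = min(len(s) - 1, int(alpha * len(s)))  # conservative (upper) index
    return s[idx]

# Hypothetical stand-in for posterior loss samples
# EVD(R_i) = V^{pi_expert}(R_i) - V^{pi_eval}(R_i),
# here replaced by synthetic numbers for illustration only.
loss_samples = [random.gauss(0.1, 0.05) for _ in range(1000)]
bound = value_at_risk(loss_samples, alpha=0.95)
```

By construction, at least 95% of the sampled losses fall at or below `bound`, which is how a probabilistic performance guarantee is read off the posterior samples.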
Similar resources
Probabilistic inverse reinforcement learning in unknown environments
We consider the problem of learning by demonstration from agents acting in unknown stochastic Markov environments or games. Our aim is to estimate agent preferences in order to construct improved policies for the same task that the agents are trying to solve. To do so, we extend previous probabilistic approaches for inverse reinforcement learning in known MDPs to the case of unknown dynamics or...
Nonlinear Inverse Reinforcement Learning with Gaussian Processes
We present a probabilistic algorithm for nonlinear inverse reinforcement learning. The goal of inverse reinforcement learning is to learn the reward function in a Markov decision process from expert demonstrations. While most prior inverse reinforcement learning algorithms represent the reward as a linear combination of a set of features, we use Gaussian processes to learn the reward as a nonli...
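The core idea in the blurb above — representing the reward as a nonlinear function of features via a Gaussian process rather than a linear feature combination — can be sketched with a tiny GP posterior-mean computation. This is a generic GP regression sketch, not the cited paper's algorithm; the data points, kernel length-scale, and `reward_mean` helper are all illustrative assumptions.

```python
import numpy as np

def rbf(a, b, ell=1.0):
    # Squared-exponential kernel on 1-D feature values.
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ell ** 2)

# Hypothetical data: "reward" values observed at a few feature values.
X = np.array([0.0, 1.0, 2.0])
y = np.array([0.0, 1.0, 0.5])
sigma2 = 1e-6  # small observation-noise variance

K = rbf(X, X) + sigma2 * np.eye(len(X))
alpha = np.linalg.solve(K, y)  # precomputed weights K^{-1} y

def reward_mean(x_star):
    """GP posterior mean of the reward at new feature values."""
    return rbf(np.asarray(x_star, dtype=float), X) @ alpha
```

Because the kernel is nonlinear in the features, `reward_mean` can fit reward shapes that no linear feature combination could represent.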
Toward Probabilistic Safety Bounds for Robot Learning from Demonstration
Learning from demonstration is a popular method for teaching robots new skills. However, little work has looked at how to measure safety in the context of learning from demonstrations. We discuss three different types of safety problems that are important for robot learning from human demonstrations: (1) using demonstrations to evaluate the safety of a robot’s current policy, (2) using demonstr...
Improving Hybrid Vehicle Fuel Efficiency Using Inverse Reinforcement Learning
Deciding what mix of engine and battery power to use is critical to hybrid vehicles’ fuel efficiency. Current solutions consider several factors such as the charge of the battery and how efficient the engine operates at a given speed. Previous research has shown that by taking into account the future power requirements of the vehicle, a more efficient balance of engine vs. battery power can be ...
Probabilistic Reasoning through Genetic Algorithms and Reinforcement Learning
In this paper, we develop an efficient approach for inference over Bayesian networks by using a reinforcement learning controller to direct a genetic algorithm. The random variables of a Bayesian network can be grouped into several sets reflecting the strong probabilistic correlations between random variables in the group. We build a reinforcement learning controller to identify these groups a...
Journal: CoRR
Volume: abs/1707.00724
Pages: -
Publication date: 2017